10 research outputs found

Certified Reinforcement Learning with Logic Guidance

Author: Abate Alessandro
Hasanbeig Mohammadhosein
Kroening Daniel
Publication date: 10/02/2020

This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, continuous-state Markov Decision Processes (MDPs), such that a given linear temporal property is satisfied. We convert the given property into a Limit Deterministic Büchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. This probability (certificate) is also calculated in parallel with policy learning when the state space of the MDP is finite: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of a finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy, maximising the above probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and we empirically show that the algorithm finds satisfying policies, if such policies exist. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for the policy synthesis, compared to existing approaches whenever available.

Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782

arXiv.org e-Print Archive

Cautious Reinforcement Learning with Logical Constraints

Author: Abate Alessandro
Hasanbeig Mohammadhosein
Kroening Daniel
Publication date: 01/01/2020

This paper presents the concept of an adaptive safe padding that forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process. Policies are synthesised to satisfy a goal, expressed as a temporal logic formula, with maximal probability. Forcing the RL agent to stay safe during learning might limit exploration; however, we show that the proposed architecture is able to automatically handle the trade-off between efficient progress in exploration (towards goal satisfaction) and ensuring safety. Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm. Experimental results are provided to showcase the performance of the proposed method.

Comment: Accepted to AAMAS 2020. arXiv admin note: text overlap with arXiv:1902.0077

arXiv.org e-Print Archive

Oxford University Research Archive

DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning

Author: Abate Alessandro
Hasanbeig Mohammadhosein
Jeppu Natasha Yogananda
Kroening Daniel
Melham Tom
Publication date: 01/01/2021

This paper proposes DeepSynth, a method for effective training of deep Reinforcement Learning (RL) agents when the reward is sparse and non-Markovian, but at the same time progress towards the reward requires achieving an unknown sequence of high-level objectives. Our method employs a novel algorithm for synthesis of compact automata to uncover this sequential structure automatically. We synthesise a human-interpretable automaton from trace data collected by exploring the environment. The state space of the environment is then enriched with the synthesised automaton so that the generation of a control policy by deep RL is guided by the discovered structure encoded in the automaton. The proposed approach is able to cope with both high-dimensional, low-level features and unknown sparse non-Markovian rewards. We have evaluated DeepSynth's performance in a set of experiments that includes the Atari game Montezuma's Revenge. Compared to existing approaches, we obtain a reduction of two orders of magnitude in the number of iterations required for policy synthesis, and also a significant improvement in scalability.

Comment: Extended version of AAAI 2021 paper

arXiv.org e-Print Archive

Oxford University Research Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Multi-agent Learning in Coverage Control Games

Author: Hasanbeig Mohammadhosein
Publication date: 01/11/2016

Multi-agent systems have found a variety of industrial applications from economics to robotics. With the increasing complexity of multi-agent systems, multi-agent control has become a challenging problem in many areas. While studying multi-agent systems is not identical to studying game theory, there is no doubt that game theory can be a key tool to manage such complex systems. Game-theoretic multi-agent learning is one of the relatively new solutions to the complex problem of multi-agent control. In such a learning scheme, each agent eventually discovers a solution on its own. The main focus of this thesis is on the enhancement of multi-agent learning in game theory and its application in multi-robot control. Each algorithm proposed in this thesis relaxes and imposes different assumptions to fit a class of multi-robot learning problems. Numerical experiments are also conducted to verify each algorithm's robustness and performance.

M.A.S

University of Toronto Research Repository

Towards verifiable and safe model-free reinforcement learning

Author: Abate Alessandro
Hasanbeig Mohammadhosein
Kroening Daniel
Publication venue: CEUR Workshop Proceedings
Publication date: 03/03/2020

Reinforcement Learning (RL) is a widely employed machine learning architecture that has been applied to a variety of decision-making problems, from resource management to robot locomotion, from recommendation systems to systems biology, and from traffic control to superhuman-level gaming. However, RL has experienced limited success beyond rigidly controlled or constrained applications, and successful employment of RL in safety-critical scenarios is yet to be achieved. A principal reason for this limitation is the lack of formal approaches to specify requirements as tasks and learning constraints, and to provide guarantees with respect to these requirements and constraints, during and after learning. This line of work addresses these issues by proposing a general framework that leverages the success of RL in learning high-performance controllers, while guaranteeing the satisfaction of given requirements and guiding the learning process within safe configurations.

Shielding Atari games with bounded prescience

Author: Giacobbe M.
Hasanbeig Mohammadhosein
Kroening Daniel
Wijk Hjalmar
Publication venue: 'Test accounts'
Publication date: 22/01/2021

Deep reinforcement learning (DRL) is applied in safety-critical domains such as robotics and autonomous driving. It achieves superhuman abilities in many tasks; however, whether DRL agents can be shown to act safely is an open problem. Atari games are a simple yet challenging exemplar for evaluating the safety of DRL agents and feature a diverse portfolio of game mechanics. The safety of neural agents has been studied before using methods that either require a model of the system dynamics or an abstraction; unfortunately, these are unsuitable for Atari games because their low-level dynamics are complex and hidden inside their emulator. We present the first exact method for analysing and ensuring the safety of DRL agents for Atari games. Our method only requires access to the emulator. First, we give a set of 43 properties that characterise "safe behaviour" for 30 games. Second, we develop a method for exploring all traces induced by an agent and a game and consider a variety of sources of game non-determinism. We observe that the best available DRL agents reliably satisfy only very few properties; several critical properties are violated by all agents. Finally, we propose a countermeasure that combines a bounded explicit-state exploration with shielding. We demonstrate that our method improves the safety of all agents over multiple properties.

Comment: To appear at AAMAS 202

arXiv.org e-Print Archive

University of Birmingham Research Portal